Spectral Clustering Strategies for Heterogeneous Disease Data

Authors

Grace T. Huang

Kathryn I. Cunningham

Panayiotis V. Benos

Chakra Chennubhotla

Abstract

Clustering of gene expression data simplifies subsequent data analyses and forms the basis of numerous approaches for biomarker identification, prediction of clinical outcome, and personalized therapeutic strategies. The most popular clustering methods, such as K-means and hierarchical clustering, are intuitive and easy to use, but they require arbitrary parameter choices (the number of clusters for K-means, and a threshold at which to cut the tree for hierarchical clustering). Human disease gene expression data are generally more difficult to cluster efficiently because background (genotype) heterogeneity, differences in disease stage and progression, and disease subtyping all make gene expression datasets more heterogeneous. Spectral clustering has recently been introduced in many fields as a promising alternative to standard clustering methods. The idea is that pairwise comparisons can help reveal global features through eigen techniques. In this paper, we developed a new recursive K-means spectral clustering method (ReKS) for disease gene expression data. We benchmarked ReKS on three large-scale cancer datasets and compared it to different clustering methods with respect to execution time, background models and external biological knowledge. We found ReKS to be superior to the hierarchical methods and as good as K-means, but much faster than these methods and without the requirement for a priori knowledge of K. Overall, ReKS offers an attractive alternative for efficient clustering of human disease data.
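
To make the "pairwise comparisons through eigen techniques" idea concrete, the following is a minimal sketch of a single generic spectral bipartition. It is not the ReKS algorithm, whose details the abstract does not give. The Gaussian affinity, the `sigma` parameter, the function name `spectral_bipartition`, the toy data, and the sign-based split (used here in place of K-means) are all assumptions made for illustration.

```python
import numpy as np


def spectral_bipartition(X, sigma=2.0):
    """Split the rows of X into two groups (illustrative sketch, not ReKS).

    Builds a Gaussian affinity matrix from pairwise distances, forms the
    symmetric normalized graph Laplacian, and thresholds the eigenvector of
    the second-smallest eigenvalue (the Fiedler vector) at zero.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity (assumed kernel); no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]

    # The sign of the Fiedler vector gives the two-way split.
    return (fiedler > 0).astype(int)


# Toy data (hypothetical): two small groups of 2-D points.
X = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, -0.2], [-0.1, 0.1],
    [4.0, 4.0], [4.2, 3.9], [3.9, 4.1], [4.1, 4.2],
])
labels = spectral_bipartition(X)
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. A recursive scheme could in principle reapply such a split within each resulting group, but the stopping rule and the role of K-means in ReKS are not described in the abstract and are not modeled here.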

Related resources

Spectral Clustering in Heterogeneous Networks

Many real-world systems consist of several types of entities, and heterogeneous networks are required to represent such systems. However, the current statistical toolbox for network data can only deal with homogeneous networks, where all nodes are supposed to be of the same type. This article introduces a statistical framework for community detection in heterogeneous networks. For modeling hete...

Multi-View K-Means Clustering on Big Data

In the past decade, more and more data have been collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be individually used for finding patterns by clustering, the clustering performance could be made more accurate by exploring the rich information among multiple views. Several multi-view clustering methods ...

Scalable Spectral Clustering with Weighted PageRank

In this paper, we propose an accelerated spectral clustering method, using a landmark selection strategy. According to the weighted PageRank algorithm, the most important nodes of the data affinity graph are selected as landmarks. The selected landmarks are provided to a landmark spectral clustering technique to achieve scalable and accurate clustering. In our experiments with two benchmark fac...

Machine Learning Approaches to Link-Based Clustering

We have reviewed several state-of-the-art machine learning approaches to different types of link-based clustering in this chapter. Specifically, we have presented the spectral clustering for heterogeneous relational data, the symmetric convex coding for homogeneous relational data, the citation model for clustering the special but popular homogeneous relational data – the textual documents with...

Spectral Clustering for Complex Settings

Many real-world datasets can be modeled as graphs, where each node corresponds to a data instance and an edge represents the relation/similarity between two nodes. To partition the nodes into different clusters, spectral clustering is used to find the normalized minimum cut of the graph (in the relaxed sense). As one of the most popul...

Publication date: 2013
